09. Accessing Elements in pandas DataFrames
We can access elements in pandas DataFrames in many different ways. In general, we can access rows, columns, or individual elements of a DataFrame by using its row and column labels. We will use the same `store_items` DataFrame created in the previous lesson. Let's see some examples:
# We print the store_items DataFrame
print(store_items)
# We access rows, columns and elements using labels
print()
print('How many bikes are in each store:\n', store_items[['bikes']])
print()
print('How many bikes and pants are in each store:\n', store_items[['bikes', 'pants']])
print()
print('What items are in Store 1:\n', store_items.loc[['store 1']])
print()
print('How many bikes are in Store 2:', store_items['bikes']['store 2'])
         bikes  glasses  pants  watches
store 1     20      NaN     30       35
store 2     15     50.0      5       10

How many bikes are in each store:
         bikes
store 1     20
store 2     15

How many bikes and pants are in each store:
         bikes  pants
store 1     20     30
store 2     15      5

What items are in Store 1:
         bikes  glasses  pants  watches
store 1     20      NaN     30       35

How many bikes are in Store 2: 15
It is important to know that when accessing individual elements of a DataFrame with chained square brackets, as we did in the last example above, the column label must always come first, i.e. in the form `dataframe[column][row]`. For example, when retrieving the number of bikes in store 2, we first used the column label `bikes` and then the row label `store 2`. If you provide the row label first, you will get a `KeyError`, because the first pair of brackets looks the label up among the columns.
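The column-first rule can be checked with a small self-contained sketch. The DataFrame below is a hypothetical two-column stand-in for `store_items`, not the one from the previous lesson. For comparison, the sketch also uses `.loc`, which takes the row label first.

```python
import pandas as pd

# Hypothetical stand-in for store_items, reduced to two columns
store_items = pd.DataFrame(
    {'bikes': [20, 15], 'pants': [30, 5]},
    index=['store 1', 'store 2'],
)

# Chained brackets: column label first, then row label
bikes_store_2 = store_items['bikes']['store 2']

# Row label first fails, because 'store 2' is not a column label
try:
    store_items['store 2']['bikes']
    row_first_failed = False
except KeyError:
    row_first_failed = True

# .loc takes the row label first, then the column label
bikes_store_2_loc = store_items.loc['store 2', 'bikes']

print(bikes_store_2, row_first_failed, bikes_store_2_loc)
```

Both forms return the same value; only the order of the labels differs.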
We can also modify our DataFrames by adding rows or columns. Let's start by learning how to add new columns. Suppose we decide to add `shirts` to the items we have in stock at each store. To do this, we need to add a new column to our `store_items` DataFrame indicating how many shirts are in each store. Let's do that:
# We add a new column named shirts to our store_items DataFrame indicating the number of
# shirts in stock at each store. We will put 15 shirts in store 1 and 2 shirts in store 2
store_items['shirts'] = [15,2]
# We display the modified DataFrame
store_items
         bikes  glasses  pants  watches  shirts
store 1     20      NaN     30       35      15
store 2     15     50.0      5       10       2
We can see that when we add a new column, the new column is added at the end of our DataFrame.
We can also add new columns to our DataFrame by using arithmetic operations between other columns in our DataFrame. Let's see an example:
# We make a new column called suits by adding the number of shirts and pants
store_items['suits'] = store_items['pants'] + store_items['shirts']
# We display the modified DataFrame
store_items
         bikes  glasses  pants  watches  shirts  suits
store 1     20      NaN     30       35      15     45
store 2     15     50.0      5       10       2      7
Suppose now that you have opened a new store and need to add its stock to your DataFrame. We can do this by adding a new row to the `store_items` DataFrame. To add rows to our DataFrame, we first have to create a new DataFrame and then append it to the original DataFrame. Let's see how this works:
# We create a list containing one Python dictionary that holds the number of items at the new store
new_items = [{'bikes': 20, 'pants': 30, 'watches': 35, 'glasses': 4}]
# We create a new DataFrame from new_items and provide an index labeled store 3
new_store = pd.DataFrame(new_items, index = ['store 3'])
# We display the items at the new store
new_store
         bikes  glasses  pants  watches
store 3     20        4     30       35
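We now add this row to our `store_items` DataFrame. Older pandas versions offered a `.append()` method for this, but it was removed in pandas 2.0, so we use the `pd.concat()` function instead. Passing `sort = True` puts the columns in alphabetical order.
# We append store 3 to our store_items DataFrame
store_items = pd.concat([store_items, new_store], sort = True)
# We display the modified DataFrame
store_items
         bikes  glasses  pants  shirts  suits  watches
store 1     20      NaN     30    15.0   45.0       35
store 2     15     50.0      5     2.0    7.0       10
store 3     20      4.0     30     NaN    NaN       35
Notice that after appending the new row, the columns have been put in alphabetical order, and the columns that were missing from `new_store` (shirts and suits) are filled with NaN for store 3.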
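How column order behaves when rows are combined can be seen in a small self-contained sketch. It uses `pd.concat()`, the function current pandas versions provide for stacking DataFrames row-wise, with the `sort` argument given explicitly. The data and the names `stores`, `unsorted`, and `alphabetical` are hypothetical and chosen only for illustration.

```python
import pandas as pd

# Hypothetical stock data; the column order is deliberately not alphabetical
stores = pd.DataFrame(
    {'watches': [35, 10], 'bikes': [20, 15], 'shirts': [15, 2]},
    index=['store 1', 'store 2'],
)
new_store = pd.DataFrame({'bikes': [20], 'watches': [35]}, index=['store 3'])

# sort=False: columns keep their order of first appearance
unsorted = pd.concat([stores, new_store], sort=False)

# sort=True: columns are put in alphabetical order
alphabetical = pd.concat([stores, new_store], sort=True)

print(list(unsorted.columns))
print(list(alphabetical.columns))
```

In both results the `shirts` value for store 3 is NaN, because `new_store` has no such column.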
We can also add new columns to our DataFrame by using only data from particular rows in particular columns. For example, suppose that you want to stock stores 2 and 3 with new watches, and you want the quantity of new watches to be the same as the watches already in stock at those stores. Let's see how we can do this:
# We add a new column using data from particular rows in the watches column
store_items['new watches'] = store_items['watches'][1:]
# We display the modified DataFrame
store_items
         bikes  glasses  pants  shirts  suits  watches  new watches
store 1     20      NaN     30    15.0   45.0       35          NaN
store 2     15     50.0      5     2.0    7.0       10         10.0
store 3     20      4.0     30     NaN    NaN       35         35.0
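The NaN for store 1 comes from label alignment: the sliced column has no entry for that row. The sketch below reproduces this on a hypothetical one-column DataFrame named `stores`, using `.iloc[1:]` to make the positional slice explicit.

```python
import pandas as pd

# Hypothetical stand-in holding only the watches column
stores = pd.DataFrame(
    {'watches': [35, 10, 35]},
    index=['store 1', 'store 2', 'store 3'],
)

# The slice keeps only the rows for store 2 and store 3
subset = stores['watches'].iloc[1:]

# Assignment aligns on row labels, so store 1 has no match and gets NaN
stores['new watches'] = subset

print(stores)
```

Because NaN is a floating-point value, the new column is stored as floats even though the source column holds integers.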
It is also possible to insert new columns into a DataFrame anywhere we want. The `dataframe.insert(loc, label, data)` method allows us to insert a new column in the `dataframe` at location `loc`, with the given column `label` and the given `data`. Let's add a new column named `shoes` right before the `suits` column. Since `suits` has numerical index value 4 (counting from 0), we will use this value as `loc`. Let's see how this works:
# We insert a new column with label shoes right before the column with numerical index 4
store_items.insert(4, 'shoes', [8,5,0])
# we display the modified DataFrame
store_items
         bikes  glasses  pants  shirts  shoes  suits  watches  new watches
store 1     20      NaN     30    15.0      8   45.0       35          NaN
store 2     15     50.0      5     2.0      5    7.0       10         10.0
store 3     20      4.0     30     NaN      0    NaN       35         35.0
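Rather than counting columns by hand, the position of `suits` can be looked up with `columns.get_loc()`. The following is a minimal sketch on a hypothetical three-column DataFrame named `stores`, so the position it finds is 2 rather than 4.

```python
import pandas as pd

# Hypothetical stand-in with fewer columns than store_items
stores = pd.DataFrame(
    {'bikes': [20, 15, 20], 'shirts': [15, 2, 0], 'suits': [45, 7, 0]},
    index=['store 1', 'store 2', 'store 3'],
)

# Look up the integer position of the suits column instead of hard-coding it
position = stores.columns.get_loc('suits')

# Insert shoes at that position, which places it right before suits
stores.insert(position, 'shoes', [8, 5, 0])

print(position, list(stores.columns))
```

Note that `.insert()` modifies the DataFrame in place, so its result is not assigned back to a variable.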
Just as we can add rows and columns, we can also delete them. To delete rows and columns from our DataFrame we will use the `.pop()` and `.drop()` methods. The `.pop()` method only allows us to delete columns, while the `.drop()` method can be used to delete both rows and columns by means of the `axis` keyword. Let's see some examples:
# We remove the new watches column
store_items.pop('new watches')
# we display the modified DataFrame
store_items
         bikes  glasses  pants  shirts  shoes  suits  watches
store 1     20      NaN     30    15.0      8   45.0       35
store 2     15     50.0      5     2.0      5    7.0       10
store 3     20      4.0     30     NaN      0    NaN       35
# We remove the watches and shoes columns
store_items = store_items.drop(['watches', 'shoes'], axis = 1)
# we display the modified DataFrame
store_items
         bikes  glasses  pants  shirts  suits
store 1     20      NaN     30    15.0   45.0
store 2     15     50.0      5     2.0    7.0
store 3     20      4.0     30     NaN    NaN
# We remove the store 2 and store 1 rows
store_items = store_items.drop(['store 2', 'store 1'], axis = 0)
# we display the modified DataFrame
store_items
         bikes  glasses  pants  shirts  suits
store 3     20      4.0     30     NaN    NaN
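The examples above assign the result of `.drop()` back to `store_items` but not the result of `.pop()`. The sketch below, on a hypothetical DataFrame named `stores`, shows why: `.pop()` changes the DataFrame in place and returns the removed column, while `.drop()` returns a new DataFrame. It also uses the `columns` and `index` keywords of `.drop()`, which are an alternative to `axis`.

```python
import pandas as pd

# Hypothetical stand-in for store_items
stores = pd.DataFrame(
    {'bikes': [20, 15], 'pants': [30, 5], 'watches': [35, 10]},
    index=['store 1', 'store 2'],
)

# .pop() removes the column in place and returns it as a Series
watches = stores.pop('watches')

# .drop() returns a new DataFrame and leaves stores unchanged
without_pants = stores.drop(columns=['pants'])     # like drop(['pants'], axis=1)
without_store_1 = stores.drop(index=['store 1'])   # like drop(['store 1'], axis=0)

print(list(stores.columns), list(without_pants.columns), list(without_store_1.index))
```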
Sometimes we might need to change the row and column labels. Let's change the `bikes` column label to `hats` using the `.rename()` method:
# We change the column label bikes to hats
store_items = store_items.rename(columns = {'bikes': 'hats'})
# we display the modified DataFrame
store_items
         hats  glasses  pants  shirts  suits
store 3    20      4.0     30     NaN    NaN
Now let's change the row label using the `.rename()` method again:
# We change the row label from store 3 to last store
store_items = store_items.rename(index = {'store 3': 'last store'})
# we display the modified DataFrame
store_items
            hats  glasses  pants  shirts  suits
last store    20      4.0     30     NaN    NaN
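The two renaming steps can also be combined, since `.rename()` accepts the `columns` and `index` mappings in the same call. Here is a minimal sketch on a hypothetical one-row DataFrame named `stores`:

```python
import pandas as pd

# Hypothetical one-row stand-in for store_items
stores = pd.DataFrame({'bikes': [20], 'pants': [30]}, index=['store 3'])

# Rename a column label and a row label in one call
renamed = stores.rename(
    columns={'bikes': 'hats'},
    index={'store 3': 'last store'},
)

print(list(renamed.columns), list(renamed.index))
```

`.rename()` returns a new DataFrame, which is why the examples above assign its result back to `store_items`.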
You can also change the index to be one of the columns in the DataFrame. The chosen column is moved out of the data and its values become the row labels:
# We change the row index to be the data in the pants column
store_items = store_items.set_index('pants')
# we display the modified DataFrame
store_items
       hats  glasses  shirts  suits
pants
30       20      4.0     NaN    NaN
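To see what `.set_index()` does to the labels, the sketch below applies it to a hypothetical one-row DataFrame named `stores`, then calls `.reset_index()` to turn the index back into an ordinary column.

```python
import pandas as pd

# Hypothetical one-row stand-in for store_items
stores = pd.DataFrame({'hats': [20], 'pants': [30]}, index=['last store'])

# The pants column becomes the index and leaves the list of columns
by_pants = stores.set_index('pants')

# reset_index() moves the index back into a column and numbers the rows from 0
restored = by_pants.reset_index()

print(by_pants.index.name, list(by_pants.columns), list(restored.columns))
```

The original row label `last store` is discarded by `.set_index()` and does not come back after `.reset_index()`.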